1      Introduction

A number of authors have recently studied stochastic equicontinuity for unbounded
empirical processes for heterogeneous and dependent observations, e.g., Andrews [2],
DeJong [11], and Hansen [16]. A review is given by Andrews [3]. In this paper we
first introduce a weighted sequential empirical process and then study its stochastic
equicontinuity. A sequential empirical process involves partial sums of stochastic
processes. As such, an empirical process in the usual sense may be viewed as a special
case of a sequential empirical process. Andrews [2] shows how empirical process theory
can be used in various applications in econometrics, particularly in establishing
asymptotic properties of econometric estimators and test statistics. We show in this
paper that sequential empirical processes arise naturally in the context of structural
change. Therefore they will be useful for testing parameter constancy in econometric
models.

Sequential empirical processes and their stochastic equicontinuity were first
discussed by Bickel and Wichura [9], but with no particular motivation. In this paper,
we consider a weighted sequential empirical process. The weighted version is motivated
by structural change in linear regressions, as explained in the next section. The
weights are the regression variables or a set of instrumental variables, so that the
process involves unbounded summands, in contrast to the one considered by Bickel and
Wichura. Also, regression variables are generally stochastic and serially correlated
(e.g., in autoregressive models); thus we have a stochastically weighted and dependent
empirical process. This kind of sequential empirical process is not discussed much in
the literature, and its stochastic equicontinuity may not be directly derived from
existing results. In addition, the contemporary treatment of dependent empirical
processes demands highly technical and abstract analysis. In this paper, we provide an
elementary proof of stochastic equicontinuity. The argument is a direct extension of
that of Billingsley [10]. Because of the concrete structure of the process, we are able
to derive concrete sufficient conditions for its stochastic equicontinuity. In
addition, an elementary proof is instructive.

Application of the result to structural changes in linear regressions is briefly
discussed. The analysis is applicable to changes in a single equation of a simultaneous
equations system. Tests based on a weighted sequential empirical process can detect
changes in regression parameters and changes in variance. Most importantly, these
tests can detect changes in error distribution functions that are not necessarily
manifested in the form of changing variances. In other words, the test can detect
changes in higher moments of the data, whereas the CUSUM, fluctuation, and Wald
type tests may not be able to.

2      Sequential  empirical  process 

For a given sequence of random variables $Z_1, Z_2, \ldots, Z_n$, the sequential empirical
process of this sequence is defined as

$$B_n(k, z) = \frac{1}{\sqrt{n}} \sum_{t=1}^{k} \{I(Z_t \le z) - F(z)\}, \qquad (k = 1, 2, \ldots, n)$$

where $F(z)$ is the distribution function of $Z_t$. Bickel and Wichura [9] first introduced
$B_n$ and established its stochastic equicontinuity for i.i.d. $Z_t$'s. This is a
two-parameter process. Each summand in the process is bounded by 1. We shall consider a
weighted sequential empirical process whose summands are not bounded and may be
dependent. Weighted sequential empirical processes arise naturally in the context of
structural change. Consider the following linear regression model:

$$y_t = x_t'\beta + \varepsilon_t \qquad (t = 1, 2, \ldots, n) \tag{1}$$

where $x_t$ is a vector of explanatory variables, $\beta$ is an unknown vector, and
$\varepsilon_t$ are i.i.d. disturbances with a continuous distribution function. Define
the vector process:

$$S_n(z) = \sum_{t=1}^{n} x_t I(y_t \le z), \qquad -\infty < z < \infty.$$

In terms of parameter inference, observing this process is equivalent to observing the
whole data set, with probability one. The vector process simply orders the original
data set according to the magnitude of the dependent variable. In this sense, $S_n(z)$
is a sufficient statistic for $\beta$ (perhaps "sufficient process" is the more
appropriate term). Now suppose the true model obeys a two-regime regression:

$$y_t = x_t'\beta_1 + \varepsilon_t \qquad (t = 1, 2, \ldots, n_1) \tag{2a}$$

$$y_t = x_t'\beta_2 + \varepsilon_t \qquad (t = n_1 + 1, \ldots, n). \tag{2b}$$

To estimate $\beta_1$ and $\beta_2$, it is important to know to which regime a given
observation belongs. The process $S_n(z)$ does not convey this information. It is thus
not a sufficient statistic (process) for $\beta_1$ and $\beta_2$. However,

$$S_n^1(z) = \sum_{t=1}^{n_1} x_t I(y_t \le z)$$

and

$$S_n^2(z) = \sum_{t=n_1+1}^{n} x_t I(y_t \le z)$$

are jointly sufficient for $\beta_1$ and $\beta_2$, because $S_n^1$ only orders the first
$n_1$ observations within themselves, and similarly $S_n^2$ orders the last $n - n_1$
observations within themselves. However, when the regime-switching point $n_1$ is
unknown, $(S_n^1, S_n^2)$ is no longer a sufficient statistic. In this case, sufficient
statistics are given by all pairs $(S_n^1, S_n^2)$ for $n_1 = 1, 2, \ldots, n$. Introduce

$$S_n(k, z) = \sum_{t=1}^{k} x_t I(y_t \le z),$$

which is a sequential empirical process up to normalization and centering. For
$k = n_1$, the process $S_n(k, z)$ is simply $S_n^1(z)$, and $S_n(n, z) - S_n(n_1, z)$ is
simply $S_n^2(z)$. Thus, when $n_1$ is unknown, sufficient statistics are given by
$S_n(k, z)$, $k = 1, \ldots, n$.
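The partial-sum structure of $S_n(k, z)$ can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the function name `sequential_weighted_sums`, the simulated two-regime data, the break point `n1`, and the grid `z_grid` are all hypothetical choices made for illustration. It computes $S_n(k, z)$ for every $k$ with one cumulative sum and recovers the two regime-specific sums from it.

```python
import numpy as np

def sequential_weighted_sums(X, y, z_grid):
    """Return S with S[k-1, j, :] = sum_{t<=k} x_t * I(y_t <= z_grid[j])."""
    X = np.asarray(X, dtype=float)            # shape (n, p)
    y = np.asarray(y, dtype=float)            # shape (n,)
    z_grid = np.asarray(z_grid, dtype=float)  # shape (m,)
    ind = (y[:, None] <= z_grid[None, :]).astype(float)   # (n, m) indicators
    summands = ind[:, :, None] * X[:, None, :]            # (n, m, p)
    return np.cumsum(summands, axis=0)                    # partial sums over t

# Hypothetical two-regime data: the slope changes after observation n1.
rng = np.random.default_rng(0)
n, n1 = 200, 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta1, beta2 = np.array([1.0, 0.5]), np.array([1.0, 1.5])
y = np.where(np.arange(n) < n1, X @ beta1, X @ beta2) + rng.normal(size=n)
z_grid = np.linspace(-3.0, 5.0, 9)

S = sequential_weighted_sums(X, y, z_grid)
S1 = S[n1 - 1]               # S_n(n1, z): sums over the first regime
S2 = S[n - 1] - S[n1 - 1]    # S_n(n, z) - S_n(n1, z): sums over the second regime
```

Because every $k$ is kept, the same array serves any candidate break point, which is the sense in which the family $S_n(k, z)$ replaces the pair of regime-specific sums when the break is unknown.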

As an example of a dependent sequential empirical process, consider a time series
regression:

$$y_t = \mu + \rho_1 y_{t-1} + \cdots + \rho_p y_{t-p} + z_t'\delta + \varepsilon_t. \tag{3}$$

Denote $x_t = (1, y_{t-1}, \ldots, y_{t-p}, z_t')'$. The sequential empirical process is
defined as before, but $x_t$ is serially correlated, yielding a dependent sequential
empirical process.


The above process $S_n(k, z)$ only describes the data; it does not incorporate the
model. To do this, we modify the process to

$$S_n(k, z, \gamma) = \sum_{t=1}^{k} x_t I(y_t \le z + x_t'\gamma).$$

A linear structure is introduced in the above process. Also note that, under model (1),

$$S_n(k, z, \beta) = \sum_{t=1}^{k} x_t I(\varepsilon_t \le z).$$

However, the parameter $\beta$ is unknown, so $S_n(k, z, \beta)$ is not observable or
computable. To solve this problem one can replace $\beta$ by an estimator $\hat\beta$.
If we put $\hat\varepsilon_t = y_t - x_t'\hat\beta$, then

$$S_n(k, z, \hat\beta) = \sum_{t=1}^{k} x_t I(\hat\varepsilon_t \le z),$$

which can be considered an estimated sequential empirical process. This process
embodies both the model and the data. The estimated parameter can be obtained from the
first $k$ observations or from the whole sample. In the former case, a sequence of
estimators is needed, which may be obtained by recursive estimation. In our
application, we use a whole-sample estimator. The test for parameter constancy is based
on the process $S_n(k, z, \hat\beta)$; see Section 4. Tests based on the weighted
sequential empirical process are more powerful than those based on a non-weighted
process, as pointed out by Bai [8]. To study the asymptotic properties of the test, we
need the weak convergence of $S_n(k, z, \hat\beta)$, which in turn depends on the weak
convergence of $S_n(k, z, \beta)$. The weak convergence of the latter is our focus. By
normalizing and centering $S_n$, we define the vector process

$$H_n(s, z) = (X'X)^{-1/2} \sum_{t=1}^{[ns]} x_t \{I(\varepsilon_t \le z) - F(z)\} \tag{4}$$

where $X = (x_1, x_2, \ldots, x_n)'$. In the next section, we study the stochastic
equicontinuity of $H_n$ and its weak convergence.
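To make definition (4) concrete, here is a minimal sketch of how $H_n(s, z)$ could be evaluated when the disturbances and their distribution function are known, as in a simulation. The helper names `inv_sqrt_sym` and `h_process`, the standard normal choice for $F$, and the simulated regressors are assumptions made for illustration only; the symmetric inverse square root is computed from an eigendecomposition.

```python
import math
import numpy as np

def inv_sqrt_sym(A):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs / np.sqrt(vals)) @ vecs.T   # V diag(vals^{-1/2}) V'

def h_process(X, eps, F, s, z):
    """H_n(s, z) = (X'X)^{-1/2} sum_{t<=[ns]} x_t {I(eps_t <= z) - F(z)}."""
    n = X.shape[0]
    k = int(np.floor(n * s))                          # [ns]
    centered = (eps[:k] <= z).astype(float) - F(z)    # centered indicators
    return inv_sqrt_sym(X.T @ X) @ (X[:k].T @ centered)

# Hypothetical example: i.i.d. N(0, 1) disturbances, constant plus one regressor.
rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)
F = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal cdf
H_half = h_process(X, eps, F, 0.5, 0.0)   # value at s = 1/2, z = 0
```

The process vanishes at $s = 0$ (empty sum) and as $z \to \infty$ (indicator and $F$ both equal 1), which gives two quick sanity checks on an implementation.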

The weighting vector $x_t$ does not have to be the regression variables. Generally, it
can be a set of instrumental variables. Consider a single equation in a simultaneous
equations system:

$$y_t = z_t'\beta + \varepsilon_t \tag{5}$$

where $z_t$ includes other endogenous variables so that $\varepsilon_t$ is correlated
with $z_t$. If one uses $z_t$ in place of $x_t$ in the definition of $H_n$, then the
process will not have a proper limit because the summands do not have zero mean. Now
suppose $x_t$ is a vector of instruments that is correlated with $z_t$ but independent
of $\varepsilon_t$. Then we can still consider the weak convergence of (4). We may call
this process the instrumental-variable weighted sequential empirical process. Tests
based on an instrumental-variable weighted process will have nontrivial local power
only if $X$ is a set of valid instruments in the sense that $\operatorname{plim}(X'X/n)$
and $\operatorname{plim}(X'Z/n)$ have full column rank and $X$ is uncorrelated with
$\varepsilon_t$, where $Z = (z_1, \ldots, z_n)'$.

3      Stochastic  Equicontinuity 

To derive the stochastic equicontinuity for weighted sequential empirical processes, we
impose the following conditions (their implications are discussed below).

(A.1) The random variables $\varepsilon_t$ are i.i.d. with a continuous distribution
function $F$.

(A.2) The disturbances $\varepsilon_t$ are independent of all contemporaneous and past
regressors.

(A.3) The regressors $\{x_t\}$ form a triangular array (for simplicity the dependence
on $n$ is suppressed) and satisfy

$$\operatorname{plim} \frac{1}{n} \sum_{t=1}^{[ns]} x_t x_t' = Q(s) \qquad \text{for } s \in [0, 1],$$

where $Q(s)$ is a $p \times p$ nonrandom positive definite matrix for $s > 0$ and
$Q(0) = 0$. The convergence is necessarily uniform in $s$, because the sum is
"monotonic" in $s$.

(A.4)

$$\max_{1 \le t \le n} n^{-1/2} \|x_t\| = o_p(1).$$

(A.5) For every fixed $s_1$, there exists a random variable $Z_n$ (which may depend on
$s_1$) such that, for all $s > s_1$,

$$\frac{1}{n} \sum_{t=[ns_1]}^{[ns]} \|x_t\| \le (s - s_1) Z_n$$

with probability one. In addition, the tail probability of $Z_n$ satisfies, for some
$\rho > 0$ and $M < \infty$:

$$P(|Z_n| > C) \le M / C^{2(1+\rho)}.$$

(A.6) There exist $\gamma > 1$, $\alpha > 1$ and $K < \infty$ such that for all
$0 \le u \le v \le 1$, and for all $n$,

$$\frac{1}{n} \sum_{i < t \le j} E(x_t'x_t)^{\gamma} \le K(v - u) \qquad \text{and} \qquad E\Big(\frac{1}{n} \sum_{i < t \le j} x_t'x_t\Big)^{\gamma} \le K(v - u)^{\alpha},$$

where $i = [nu]$, $j = [nv]$.

Assumption (A.2) allows for dynamic models, e.g., the autoregressive model (3).
Assumption (A.3) allows for trending regressors written in the form $x_t = g(t/n)$, for
some function $g$. This assumption is often maintained in recursive estimation for
constructing CUSUM tests; see, e.g., Ploberger, Krämer, and Kontrus [18]. Assumption
(A.4) is conventional for linear models and is used for obtaining normality.
Assumptions (A.5) and (A.6) are specific to our problem. They are the main assumptions
for the equicontinuity of the sequential empirical process $H_n$. In (A.5), $Z_n$ may be
taken to be $\max_k k^{-1} \sum_{t=i}^{i+k} \|x_t\|$ provided the condition on the tail
probability is also satisfied, where $i = [ns_1]$ is fixed. It is generally impossible,
however, to choose $Z_n = O_p(1)$ uniformly in both $s$ and $s_1$. If
$E(x_t'x_t)^2 \le M$ for all $t$, then (A.6) is satisfied with $\gamma = 2$ and
$\alpha = 2$, because
$E(\sum_{i<t\le j} x_t'x_t)^2 \le \{\sum_{i<t\le j} [E(x_t'x_t)^2]^{1/2}\}^2$ by the
Cauchy-Schwarz inequality. Finally, when the regressors $x_t$ are bounded,
(A.4)-(A.6) will be satisfied.
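The Cauchy-Schwarz step behind the claim that a uniformly bounded fourth moment delivers (A.6) with both exponents equal to 2 can be written out as follows. This is a sketch rather than part of the original argument: it uses the shorthand $a_t = x_t'x_t$, sums over $i < t \le j$ with $i = [nu]$ and $j = [nv]$, and treats $(j - i)/n$ as $v - u$, that is, it ignores the integer rounding, which is an assumption made here for readability.

```latex
% Assumption: E(a_t^2) <= M for all t, where a_t = x_t' x_t.
% Assumption: (j - i)/n is treated as v - u (integer rounding ignored).
\begin{align*}
E\Big(\sum_{i<t\le j} a_t\Big)^2
  &= \sum_{i<t\le j}\,\sum_{i<h\le j} E(a_t a_h)
   \le \sum_{i<t\le j}\,\sum_{i<h\le j} [E(a_t^2)]^{1/2}\,[E(a_h^2)]^{1/2} \\
  &= \Big\{\sum_{i<t\le j} [E(a_t^2)]^{1/2}\Big\}^2
   \le M\,(j-i)^2, \\
E\Big(\frac{1}{n}\sum_{i<t\le j} a_t\Big)^2
  &\le M\Big(\frac{j-i}{n}\Big)^2 \approx M\,(v-u)^2, \\
\frac{1}{n}\sum_{i<t\le j} E(a_t^2)
  &\le M\,\frac{j-i}{n} \approx M\,(v-u).
\end{align*}
```

Under these simplifications both parts of (A.6) hold with the constant $K$ taken equal to $M$.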

Let $T = [0, 1] \times \mathbb{R}$ be the parameter set with metric
$\rho(\{r, y\}, \{s, z\}) = |s - r| + |F(z) - F(y)|$. Let $D[T]$ be the set of functions
defined on $T$ that are right continuous and have left limits. We equip $D[T]$ with the
Skorohod metric (Pollard [19]). The vector process $H_n$ belongs to the Cartesian
product space $D[T]^p$, equipped with the corresponding product Skorohod topology. The
weak convergence of $H_n$ in the space $D[T]^p$ is implied by finite-dimensional
convergence together with stochastic equicontinuity. The latter condition also implies
that the sample paths of the limiting process of $H_n$ are continuous with probability
one.


Theorem 1 Under assumptions (A.1), (A.2), (A.5), and (A.6), the process $H_n$ is
stochastically equicontinuous on $(T, \rho)$. That is, for any $\epsilon > 0$,
$\eta > 0$, there exists a $\delta > 0$ such that for large $n$,

$$P\Big(\sup_{[\delta]} \|H_n(r, y) - H_n(s, z)\| > \eta\Big) < \epsilon$$

where $[\delta] = \{(\tau_1, \tau_2);\ \tau_1 = (r, y),\ \tau_2 = (s, z),\ \rho(\tau_1, \tau_2) < \delta\}$
with $[\delta] \subset T \times T$.

When $x_t = 1$ for all $t$, the equicontinuity of $H_n$ is implied by the result of
Bickel and Wichura [9]. When $\{(x_t, \varepsilon_t)\}$ are independent, equicontinuity
can also be proved by extending the method of Bickel and Wichura. It is the statistical
dependence in the data that requires a different framework of proof. Dependence can be
a serious obstacle to proving equicontinuity. Indeed, the powerful tool of
symmetrization depends heavily on the independence assumption, although it has been
extended to m-dependent processes by Andrews [2]. Recent developments explore ways of
getting around this difficulty, and there are many successful results; examples are
Andrews [1] for a smoothed class of functions of near-epoch variables, DeJong [11] for
unbounded strong mixing processes, Doukhan, Massart and Rio [12] for unbounded
absolutely regular processes, and Hansen [16] for unbounded mixingales. A review is
provided by Andrews [3]; see also Andrews and Pollard [5] for a bounded strong mixing
sequence. For the proposed weighted sequential empirical process (4), the summands are
unbounded martingale differences (for each fixed $z$). It seems that the method of
Levental [17] may be used. The conditions of Levental, however, are not primitive and
only work well for bounded martingales, though it is noted by Hansen [16] that
Levental's method may be extended to unbounded ones. Hansen's own approach, as pointed
out by the author himself, does not work well for indicator types of functions. We
shall offer an elementary proof. Since our purpose is not to cover as wide a range of
processes as possible, our conditions are specific and primitive. Given the concrete
structure of our process, we feel an elementary argument is more instructive. We are
also interested in the limiting process.


The proof is provided in the appendix. In the proof, we focus on the vector process

$$Y_n(s, u) = n^{-1/2} \sum_{t=1}^{[ns]} x_t \{I(U_t \le u) - u\} \tag{6}$$

where $U_1, U_2, \ldots, U_n$ are i.i.d. uniform on $[0, 1]$ with $U_t$ independent of
$x_j$ for $j \le t$. Effectively, we replace $\varepsilon_t$ by $F(\varepsilon_t)$, which
is uniform on $[0, 1]$, so that $H_n(s, z) = (X'X/n)^{-1/2} Y_n(s, F(z))$. By assumption,
$(X'X/n) \stackrel{p}{\to} Q(1)$, a positive definite matrix, so $Y_n$ and $H_n$ are
equivalent in terms of stochastic equicontinuity.

Corollary 1 Under assumptions (A.1)-(A.6), the process $H_n$ converges weakly to a
Gaussian process $H$ with zero mean and covariance matrix

$$E\{H(r, y) H(s, z)'\} = Q(1)^{-1/2} Q(r \wedge s) Q(1)^{-1/2} [F(z \wedge y) - F(z)F(y)]. \tag{7}$$

Proof. The finite-dimensional convergence to a normal distribution follows from the CLT
for martingale differences. This together with Theorem 1 implies that $H_n$ converges
weakly to some process $H$. To verify the covariance matrix, consider the covariance of
$Y_n$ for $r < s$ and $u = F(z) < v = F(y)$. Using double expectation and the
martingale property, we obtain

$$E\{Y_n(r, u) Y_n(s, v)'\} = \frac{1}{n} E\Big(\sum_{t=1}^{[nr]} x_t x_t'\Big)(u - uv) \tag{8}$$

which tends to $Q(r)(u - uv)$. From $(X'X/n)^{-1/2} \stackrel{p}{\to} Q(1)^{-1/2}$, we
arrive at (7).       □
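The "double expectation and martingale property" step leading to (8) can be sketched as follows. The shorthand $\eta_t(u) = I(U_t \le u) - u$ and the use of $\mathcal{F}_{t-1}$ for the information available before $U_t$ is drawn (which includes $x_t$) are notational assumptions introduced here; they mirror the filtration used in the appendix.

```latex
% Shorthand (introduced here): eta_t(u) = I(U_t <= u) - u, and r <= s, u <= v.
\begin{align*}
E\{Y_n(r,u)\,Y_n(s,v)'\}
  &= \frac{1}{n}\sum_{t=1}^{[nr]}\sum_{h=1}^{[ns]}
     E\{x_t x_h'\,\eta_t(u)\,\eta_h(v)\}. \\
\intertext{For $h > t$, the factors $x_t$, $x_h$ and $\eta_t(u)$ are
$\mathcal{F}_{h-1}$-measurable while $E\{\eta_h(v)\mid\mathcal{F}_{h-1}\} = 0$,
so the cross terms vanish (and symmetrically for $h < t$). For $h = t$,}
E\{\eta_t(u)\,\eta_t(v)\mid\mathcal{F}_{t-1}\}
  &= u \wedge v - uv = u - uv, \\
E\{Y_n(r,u)\,Y_n(s,v)'\}
  &= \frac{1}{n}\sum_{t=1}^{[nr]} E(x_t x_t')\,(u - uv).
\end{align*}
```

Only the diagonal terms with $t \le [nr]$ survive because $r \le s$, which is why the limit involves $Q(r) = Q(r \wedge s)$.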

We now introduce a Brownian bridge type process which is closely related to tests for
parameter constancy in linear regressions. Let $X_k = (x_1, \ldots, x_k)'$,
$X = (x_1, x_2, \ldots, x_n)'$, and

$$A_k = (X'X)^{-1/2} (X_k'X_k) (X'X)^{-1/2}. \tag{9}$$

The matrix $A_{[ns]}$ converges to $A(s) = Q(1)^{-1/2} Q(s) Q(1)^{-1/2}$. In the special
case that $Q(s) = sQ$ for some positive definite matrix $Q$, $A(s) = sI$, where $I$ is
the $p \times p$ identity matrix.


Corollary 2 Under the assumptions of Corollary 1, the process $V_n$ defined as

$$V_n(s, z) = H_n(s, z) - A_{[ns]} H_n(1, z)$$

converges weakly to a Gaussian process $V$ with mean zero and covariance matrix

$$E\{V(r, y) V(s, z)'\} = \{A(r \wedge s) - A(r)A(s)\}\{F(y \wedge z) - F(y)F(z)\}. \tag{10}$$

Proof. The stochastic equicontinuity of $V_n$ follows from that of $H_n$ and the
convergence of $A_{[ns]}$ to a deterministic matrix $A(s)$ uniformly in $s$. The
limiting process of $V_n$ is, by Corollary 1,

$$V(s, z) = H(s, z) - A(s) H(1, z).$$

Now (10) follows easily from (7).        □
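The step from (7) to (10) is a short expansion, sketched here for completeness. The abbreviation $c(y, z) = F(y \wedge z) - F(y)F(z)$ is introduced only for this display; the calculation uses that (7) can be written as $E\{H(r,y)H(s,z)'\} = A(r \wedge s)\,c(y,z)$, that $A(s)$ is symmetric, and that $A(1) = I$.

```latex
% Abbreviation (introduced here): c(y,z) = F(y ^ z) - F(y) F(z).
\begin{align*}
E\{V(r,y)V(s,z)'\}
  &= E\{[H(r,y) - A(r)H(1,y)]\,[H(s,z) - A(s)H(1,z)]'\} \\
  &= \{A(r\wedge s) - A(r\wedge 1)A(s) - A(r)A(1\wedge s) + A(r)A(1)A(s)\}\,c(y,z) \\
  &= \{A(r\wedge s) - A(r)A(s) - A(r)A(s) + A(r)A(s)\}\,c(y,z) \\
  &= \{A(r\wedge s) - A(r)A(s)\}\,c(y,z).
\end{align*}
```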

As noted earlier, when $Q(s) = sQ$ for some $Q > 0$, $A(s)$ becomes $sI$ and the
covariance matrix of $V$ becomes $(r \wedge s - rs)\{F(z \wedge y) - F(z)F(y)\}I$. A
process $B(s, u)$ is said to be a two-parameter Brownian bridge on $[0, 1]^2$ if it is
a zero-mean Gaussian process with covariance function

$$E B(r, u) B(s, v) = (r \wedge s - rs)(u \wedge v - uv).$$

We see that $V(s, z)$ has the same distribution as $B^*(s, F(z))$, where $B^*$ is a
vector of $p$ independent two-parameter Brownian bridges.
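The tied-down structure of the limit can be seen in a finite sample. The following sketch covers only the scalar case $x_t = 1$ (so $p = 1$ and $A_k = k/n$) with uniform draws in place of $F(\varepsilon_t)$; the function name `bridge_type_process`, the sample size, and the grid are hypothetical choices for illustration. It builds the finite-sample analogue of $V_n$ on a grid and returns values for $k = 0, 1, \ldots, n$.

```python
import numpy as np

def bridge_type_process(u_draws, u_grid):
    """V[k, j], k = 0..n: tied-down partial sums of I(U_t <= u_j) - u_j (case x_t = 1)."""
    u_draws = np.asarray(u_draws, dtype=float)
    u_grid = np.asarray(u_grid, dtype=float)
    n = u_draws.size
    centered = (u_draws[:, None] <= u_grid[None, :]).astype(float) - u_grid[None, :]
    # Row k holds the sum over t <= k; row 0 is the empty sum.
    partial = np.vstack([np.zeros((1, u_grid.size)), np.cumsum(centered, axis=0)])
    frac = np.arange(n + 1)[:, None] / n        # A_k = k/n when x_t = 1
    return (partial - frac * partial[-1]) / np.sqrt(n)

rng = np.random.default_rng(1)
V = bridge_type_process(rng.uniform(size=500), np.linspace(0.0, 1.0, 21))
```

The array vanishes at $s = 0$, at $s = 1$, and at $u = 1$, matching the zero variance that the covariance $(r \wedge s - rs)(u \wedge v - uv)$ assigns to those edges of the unit square.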

4     An  Application  in  Structural  Change 

Consider the structural change model (2). The objective is to test the null hypothesis
$H_0: \beta_1 = \beta_2$ with $n_1$ unknown. There is a rich literature on the problem;
see, for example, Andrews [4]. Here we construct a test using an estimated sequential
empirical process. We estimate model (1) by OLS or other methods and compute the
residuals $\hat\varepsilon_t = y_t - x_t'\hat\beta$. Define the $p \times 1$ vector
process $T_n$,

$$T_n\Big(\frac{k}{n}, z\Big) = (X'X)^{-1/2} \sum_{t=1}^{k} x_t I(\hat\varepsilon_t \le z) - A_k (X'X)^{-1/2} \sum_{t=1}^{n} x_t I(\hat\varepsilon_t \le z) \tag{11}$$

and the test statistic

$$M_n = \max_k \sup_z \Big\|T_n\Big(\frac{k}{n}, z\Big)\Big\|_\infty$$

where $\|y\|_\infty = \max\{|y_1|, \ldots, |y_p|\}$ is the maximum norm. The process $T_n$
takes at most $n^2$ different values, so the maximum value always exists. The actual
computation of $M_n$ is straightforward. If $x_t$ contains a constant regressor (we do
assume this), then

$$(X'X)^{-1/2} \sum_{t=1}^{k} x_t - A_k (X'X)^{-1/2} \sum_{t=1}^{n} x_t = 0 \qquad \text{for all } k$$

so that $I(\hat\varepsilon_t \le z)$ can be replaced by $I(\hat\varepsilon_t \le z) - F(z)$
without changing the value of $T_n$. Therefore, $T_n$ is centered (only approximately
centered, because $F$ is not the d.f. of $\hat\varepsilon_t$). Recognizing this, we see
that $T_n$ is the same as $V_n$ defined in Corollary 2 except that $T_n$ uses estimated
residuals while $V_n$ uses the true disturbances.
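One way the statistic $M_n$ could be computed is sketched below. This is an illustrative implementation under stated assumptions, not the author's code: the function names `tn_process` and `mn_statistic` are hypothetical, the estimator is whole-sample OLS via `numpy.linalg.lstsq`, and the supremum over $z$ is taken over the sorted residuals, which suffices because $T_n$ is a right-continuous step function of $z$ that jumps only at residual values and equals zero below the smallest residual. The arrays are of size $n \times n \times p$, so the sketch is meant for moderate $n$.

```python
import numpy as np

def tn_process(X, y):
    """Return T with T[j, k-1, :] = T_n(k/n, z_j), z_j being the sorted OLS residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]        # whole-sample OLS
    resid = y - X @ beta_hat
    vals, vecs = np.linalg.eigh(X.T @ X)
    root_inv = (vecs / np.sqrt(vals)) @ vecs.T             # (X'X)^{-1/2}
    W = X @ root_inv                                       # row t is ((X'X)^{-1/2} x_t)'
    A = np.cumsum(W[:, :, None] * W[:, None, :], axis=0)   # A[k-1] = A_k from (9)
    z_sorted = np.sort(resid)
    ind = (resid[None, :] <= z_sorted[:, None]).astype(float)      # (n, n) indicators
    partial = np.cumsum(ind[:, :, None] * W[None, :, :], axis=1)   # first term of (11)
    total = partial[:, -1, :]                                      # full-sample sums
    return partial - np.einsum("kab,jb->jka", A, total)            # subtract A_k * total

def mn_statistic(X, y):
    """M_n: maximum over k, z and components of |T_n(k/n, z)|."""
    return float(np.abs(tn_process(X, y)).max())

# Hypothetical data with a constant regressor and no break.
rng = np.random.default_rng(3)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
T = tn_process(X, y)
M_n = mn_statistic(X, y)
```

Since $A_n = I$, the process is tied down at $k = n$, which is a convenient check on the matrix algebra.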

Using the result of this paper, Bai [7] shows that if the residuals are obtained from
a root-n consistent estimator of $\beta$, then

$$T_n\Big(\frac{[ns]}{n}, z\Big) \Rightarrow B^*(s, F(z))$$

where $B^* = (B_1, B_2, \ldots, B_p)'$ is a vector of $p$ independent two-parameter
Brownian bridges defined on $[0, 1]^2$. A similar result is obtained by Bai [6] for
bounded (non-weighted) sequential empirical processes based on ARMA residuals. By the
continuous mapping theorem,

$$M_n \stackrel{d}{\to} \max_{0 \le s, u \le 1} \|B^*(s, u)\|_\infty.$$

The test $M_n$ is asymptotically distribution free. Critical values are tabulated in
Bai [8]. It is interesting to note that the limiting process of $T_n$ does not depend
on the estimated parameters. The underlying reason is that the process $T_n$ consists
of two terms, and the estimation effects in the first term and the second term cancel
out. This is in contrast with the classical goodness-of-fit test, where the estimation
effect does not go away; see Durbin [13] and [14].

As for the limiting distribution under local alternatives, we consider a single
equation in a simultaneous equations system. The model under the null hypothesis is
given by (5). Suppose the alternative hypothesis postulates that

$$y_t = z_t'\beta_t + \varepsilon_t \tag{12}$$

where $\beta_t = \beta + n^{-1/2} g(t/n)$ with $g$ a vector-valued function defined on
$[0, 1]$ and Riemann-Stieltjes integrable. Suppose $x_t$ is a vector of instrumental
variables. For simplicity, assume in (A.3) that $Q(s) = sQ_{xx}$ for some $Q_{xx} > 0$.
Also assume $\operatorname{plim} \frac{1}{n} \sum_{t=1}^{[ns]} x_t z_t' = sQ_{xz}$, where
$Q_{xz}$ is a $p \times r$ matrix. Then Bai [8] shows that

$$M_n \stackrel{d}{\to} \max_{0 \le s, u \le 1} \|B^*(s, u) + p(u) Q_{xx}^{-1/2} Q_{xz} G(s)\|_\infty$$

where $p(u) = f(F^{-1}(u))$ and $G(s) = \int_0^s g(v)\,dv - s \int_0^1 g(v)\,dv$. Note that
$G(s) \equiv 0$ if and only if $g$ is a constant vector, implying no change in
$\beta_t$. In order for the test to have non-trivial local power for "all"
non-constant $g$'s, $Q_{xz}$ must have full column rank. Otherwise, there exists a
non-zero $G(s)$ such that $Q_{xz} G(s) \equiv 0$, so that $M_n$ will have the same
limiting distribution under both the null and alternative hypotheses. In summary, when
valid instruments are available, $M_n$ can be used to test changes in a simultaneous
equations system and $M_n$ possesses non-trivial local power. Regressors themselves
constitute valid instruments when they are independent of the error disturbances.

The same test can also detect changes of the following type:

$$y_t = x_t'\beta + \varepsilon_t^*$$

with $\varepsilon_t^*$ having a continuous d.f. $F$ for $t \le n_1$ and a continuous d.f.
$G$ for $t > n_1$. Bai [8] argues that even if the two distributions have the same mean
and the same variance, as long as $F \ne G$, the change is detectable by $M_n$, whereas
the fluctuation, CUSUM and Wald tests may fail to diagnose this kind of shift.
A      Appendix

Lemma A.1 Assume the conditions of Theorem 1 hold. Then there exists a $K < \infty$
such that for all $s_1 < s_2$ and $u_1 < u_2$, where $0 \le s_i, u_i \le 1$ ($i = 1, 2$),

$$E\|Y_n(s_2, u_2) - Y_n(s_1, u_2) - Y_n(s_2, u_1) + Y_n(s_1, u_1)\|^{2\gamma} \le K(u_2 - u_1)^{\alpha}(s_2 - s_1)^{\alpha} + n^{-(\gamma-1)} K (u_2 - u_1)(s_2 - s_1).$$

Without loss of generality, one can assume that $\alpha \le \gamma$, since
$|u_2 - u_1| \le 1$ and $|s_2 - s_1| \le 1$. Moreover, when

$$\tau n^{-(\gamma-1)/2(\alpha-1)} \le u_2 - u_1 \qquad \text{and} \qquad \tau n^{-(\gamma-1)/2(\alpha-1)} \le s_2 - s_1 \tag{13}$$

for $\tau > 0$, the lemma implies

$$E\|Y_n(s_2, u_2) - Y_n(s_1, u_2) - Y_n(s_2, u_1) + Y_n(s_1, u_1)\|^{2\gamma} \le K[1 + \tau^{-2(\alpha-1)}](u_2 - u_1)^{\alpha}(s_2 - s_1)^{\alpha}. \tag{14}$$

This inequality is analogous to (22.15) of Billingsley ([10], p. 198).
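The passage from the lemma to (14) under condition (13) rests on a short calculation, sketched here. It raises both inequalities in (13) to the power $\alpha - 1 > 0$, multiplies them, and uses the result to absorb the second term of the lemma's bound into the first.

```latex
% Both inequalities in (13) are raised to the power alpha - 1 > 0, then multiplied.
\begin{align*}
\tau^{\alpha-1}\, n^{-(\gamma-1)/2} &\le (u_2-u_1)^{\alpha-1},
\qquad
\tau^{\alpha-1}\, n^{-(\gamma-1)/2} \le (s_2-s_1)^{\alpha-1}, \\
n^{-(\gamma-1)} &\le \tau^{-2(\alpha-1)}\,(u_2-u_1)^{\alpha-1}(s_2-s_1)^{\alpha-1}, \\
n^{-(\gamma-1)} K\,(u_2-u_1)(s_2-s_1)
  &\le K\,\tau^{-2(\alpha-1)}\,(u_2-u_1)^{\alpha}(s_2-s_1)^{\alpha}.
\end{align*}
```

Adding the first term $K(u_2 - u_1)^{\alpha}(s_2 - s_1)^{\alpha}$ of the lemma gives the factor $1 + \tau^{-2(\alpha-1)}$ in (14).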
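Proof. Write $\eta_t = I(u_1 < U_t \le u_2) - u_2 + u_1$ and
$Y^* = Y_n(s_2, u_2) - Y_n(s_1, u_2) - Y_n(s_2, u_1) + Y_n(s_1, u_1)$ for the moment. Then
$Y^* = n^{-1/2} \sum_{i < t \le j} x_t \eta_t$ with $i = [ns_1]$ and $j = [ns_2]$. Note that
$\{x_t \eta_t, \mathcal{F}_t\}$ is a sequence of (nonstationary) vector martingale
differences, where $\mathcal{F}_t$ is the $\sigma$-field generated by
$\ldots, x_t, x_{t+1};\ \ldots, U_{t-1}, U_t$. By the inequality of Rosenthal (Hall and
Heyde [15], p. 23), there exists a constant $M < \infty$ depending only on $\gamma$ and
$p$ such that

$$E\|Y^*\|^{2\gamma} = E\Big\{\Big(n^{-1/2} \sum_{i<t\le j} x_t \eta_t\Big)'\Big(n^{-1/2} \sum_{i<h\le j} x_h \eta_h\Big)\Big\}^{\gamma}$$

$$\le M E\Big(\frac{1}{n} \sum_{i<t\le j} E\{(x_t'x_t)\eta_t^2 \mid \mathcal{F}_{t-1}\}\Big)^{\gamma} + M n^{-\gamma} \sum_{i<t\le j} E\{(x_t'x_t)^{\gamma} \eta_t^{2\gamma}\}. \tag{15}$$

Note that $x_t$ is measurable with respect to $\mathcal{F}_{t-1}$ and $\eta_t$ is
independent of $\mathcal{F}_{t-1}$. In addition, $E\eta_t^2 \le u_2 - u_1$ and
$E\eta_t^{2\gamma} \le u_2 - u_1$. These results together with assumption (A.6) provide
bounds for the two terms on the right of (15). The first term is bounded by

$$M (u_2 - u_1)^{\gamma} E\Big(\frac{1}{n} \sum_{i<t\le j} x_t'x_t\Big)^{\gamma} \le MK (u_2 - u_1)^{\gamma}(s_2 - s_1)^{\alpha}$$

and the second term is bounded by

$$M n^{-(\gamma-1)} (u_2 - u_1) \frac{1}{n} \sum_{i<t\le j} E(x_t'x_t)^{\gamma} \le MK n^{-(\gamma-1)} (u_2 - u_1)(s_2 - s_1).$$

Renaming $MK$ as $K$, the lemma follows from $(u_2 - u_1)^{\gamma} \le (u_2 - u_1)^{\alpha}$,
for $\gamma \ge \alpha$.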

Lemma A.2 Under (A.5), we have for $s_1 \le s \le s_2$ and $u_1 \le u \le u_2$,

$$\|Y_n(s, u) - Y_n(s_1, u_1)\| \le \|Y_n(s_2, u_2) - Y_n(s_1, u_1)\| + O_p(1)\, n^{1/2} [(u_2 - u_1) + (s_2 - s_1)]$$

where the term $O_p(1)$ is uniform in $s$ ($s > s_1$), does not depend on $u$ and $u_1$,
and satisfies

$$P(|O_p(1)| > C) \le M / C^{2(1+\rho)}, \qquad \forall C > 0, \quad \text{for some } \rho > 0.$$

Proof. First notice that all of the components of $x_t$ can be assumed to be
nonnegative. Otherwise write $x_t = \sum_{i=1}^{p} x_t^+(i) - \sum_{i=1}^{p} x_t^-(i)$, where
$x_t^+(i) = (0, \ldots, 0, x_{ti}, 0, \ldots, 0)'$ if $x_{ti} \ge 0$ and
$x_t^-(i) = (0, \ldots, 0, -x_{ti}, 0, \ldots, 0)'$ if $x_{ti} < 0$ (each being the zero
vector otherwise). In this way, $Y_n$ can be written as a linear combination (with
coefficients 1 or $-1$) of at most $2p$ processes, each having nonnegative weighting
vectors. In addition, $\|x_t^+(i)\| \le \|x_t\|$ and $\|x_t^-(i)\| \le \|x_t\|$, so
assumptions (A.5) and (A.6) are satisfied for $x_t^+(i)$ and $x_t^-(i)$. It is thus
enough to assume that the $x_t$ are nonnegative. As a new piece of notation, for
vectors $a$ and $b$, take $a \le b$ to mean $a_i \le b_i$ for all components. Since
$x_t \ge 0$, the vector functions $x_t I(U_t \le u)$ and $x_t u$ are nondecreasing in
$u$. It is easy to show


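$$Y_n(s, u) - Y_n(s_1, u_1) \le Y_n(s_2, u_2) - Y_n(s_1, u_1) + n^{1/2}\Big(\frac{1}{n} \sum_{t=1}^{[ns]} x_t\Big)(u_2 - u) + n^{1/2}\Big|\frac{1}{n} \sum_{t=[ns]+1}^{[ns_2]} x_t \{I(U_t \le u_2) - u_2\}\Big|$$

and

$$Y_n(s_1, u_1) - Y_n(s, u) \le n^{1/2}\Big(\frac{1}{n} \sum_{t=1}^{[ns]} x_t\Big)(u - u_1) + n^{1/2}\Big|\frac{1}{n} \sum_{t=[ns_1]+1}^{[ns]} x_t \{I(U_t \le u_1) - u_1\}\Big|$$

(absolute values taken componentwise).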


The  lemma  follows  from  the  boundedness  of  the  indicator  function  and  (A. 5).       □ 

Proof of Theorem 1. We shall evaluate the modulus of continuity directly. Define

$$\omega_\delta(Y_n) = \sup\{\|Y_n(s', u') - Y_n(s'', u'')\|;\ |s' - s''| \le \delta,\ |u' - u''| \le \delta,\ s', s'', u', u'' \in [0, 1]\}.$$

We shall show that for any $\epsilon > 0$ and $\eta > 0$, there exist a $\delta > 0$ and
an integer $n_0$ such that

$$P(\omega_\delta(Y_n) > \epsilon) < \eta, \qquad n \ge n_0.$$

Since $[0, 1]^2$ has only about $\delta^{-2}$ squares with side length $\delta$, it
suffices to show that for every point $(s_1, u_1) \in [0, 1]^2$, every $\epsilon > 0$ and
$\eta > 0$, there exist a $\delta \in (0, 1)$ and an integer $n_0$ such that

$$P\Big(\sup_{(\delta)} \|Y_n(s, u) - Y_n(s_1, u_1)\| > 5\epsilon\Big) < 2\delta^2 \eta, \qquad n \ge n_0, \tag{16}$$

where $(\delta) = \{(s, u);\ s_1 \le s \le s_1 + \delta,\ u_1 \le u \le u_1 + \delta\} \cap [0, 1]^2$.

For a given $\delta > 0$ and $\eta > 0$, choose $C$ large enough so that the $O_p(1)$ in
Lemma A.2 satisfies

$$P(|O_p(1)| > C) < \delta^2 \eta. \tag{17}$$

By Lemma A.2 (see also (22.18) of Billingsley [10], p. 199), when $|O_p(1)| \le C$,

$$\sup_{(\delta)} \|Y_n(s, u) - Y_n(s_1, u_1)\| \le 3 \max_{1 \le i, j \le m} \|Y_n(s_1 + i\epsilon_n, u_1 + j\epsilon_n) - Y_n(s_1, u_1)\| + 2\epsilon$$

where $\epsilon_n = \epsilon / (n^{1/2} C)$ and $m = [n^{1/2} C \delta / \epsilon] + 1$. Write

$$X(i, j) = Y_n(s_1 + i\epsilon_n, u_1 + j\epsilon_n) - Y_n(s_1, u_1).$$

Then

$$P\Big(\sup_{(\delta)} \|Y_n(s, u) - Y_n(s_1, u_1)\| > 5\epsilon\Big) \le P(|O_p(1)| > C) + P\Big(\max_{1 \le i, j \le m} \|X(i, j)\| > \epsilon\Big). \tag{18}$$

Now for fixed $i$ and $k$ ($i > k$) write $Z(j) = X(i, j) - X(k, j)$. Notice that

$$(\epsilon / C)\, n^{-(\gamma-1)/2(\alpha-1)} \le \epsilon / (C n^{1/2}) = \epsilon_n \le j\epsilon_n, \qquad j \ge 1,$$

which follows from $n^{-(\gamma-1)/2(\alpha-1)} \le n^{-1/2}$ because $1 < \alpha \le \gamma$.
By (13) and (14),

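$$E\|Z(j) - Z(l)\|^{2\gamma} \le K C_\epsilon [(i - k)\epsilon_n]^{\alpha} [(j - l)\epsilon_n]^{\alpha}, \qquad 1 \le l < j \le m$$

where, from (14) with $\tau = \epsilon / C$,

$$C_\epsilon = [1 + (C/\epsilon)^{2(\alpha-1)}] \le 2 (C/\epsilon)^{2(\alpha-1)} \quad \text{for small } \epsilon. \tag{19}$$

Thus by Theorem 12.2 of Billingsley ([10], p. 94), we have

$$P\Big(\max_{1 \le j \le m} \|Z(j)\| > \epsilon\Big) \le \frac{K_1 K C_\epsilon}{\epsilon^{2\gamma}} [(i - k)\epsilon_n]^{\alpha} (m\epsilon_n)^{\alpha} \le \frac{K_2 C_\epsilon}{\epsilon^{2\gamma}} [(i - k)\epsilon_n]^{\alpha} \delta^{\alpha} \tag{20}$$

where $K_1$ is a generic constant and $K_2 = 2^{\alpha} K_1 K$. The last inequality follows
from $(m\epsilon_n) \le 2\delta$ for large $n$. Because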


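$$\Big|\max_j \|X(i, j)\| - \max_j \|X(k, j)\|\Big| \le \max_j \|X(i, j) - X(k, j)\| = \max_j \|Z(j)\|,$$

if we let $V(i) = \max_j \|X(i, j)\|$, then (20) implies

$$P(|V(i) - V(k)| > \epsilon) \le \frac{K_2 C_\epsilon}{\epsilon^{2\gamma}} [(i - k)\epsilon_n]^{\alpha} \delta^{\alpha}, \qquad 1 \le k < i \le m.$$

Thus by Theorem 12.2 of Billingsley once again [let $\xi_h = V(h) - V(h - 1)$, so that
$V(i)$ is the partial sum $S_i$ of the random variables $\xi_h$, in Billingsley's
notation], we obtain

$$P\Big(\max_{1 \le i \le m} |V(i)| > \epsilon\Big) \le \frac{K_1' K_2 C_\epsilon}{\epsilon^{2\gamma}} (m\epsilon_n)^{\alpha} \delta^{\alpha} \le \frac{K_3 C_\epsilon}{\epsilon^{2\gamma}} \delta^{2\alpha}$$

where $K_1'$ is a generic constant and $K_3 = 2^{\alpha} K_1' K_2$. Note that
$\max_i |V(i)| = \max_i \max_j \|X(i, j)\|$. Thus by (18)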


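$$P\Big(\sup_{(\delta)} \|Y_n(s, u) - Y_n(s_1, u_1)\| > 5\epsilon\Big) \le \delta^2 \eta + \frac{K_3 C_\epsilon}{\epsilon^{2\gamma}} \delta^{2\alpha}.$$

By (19), the second term on the right-hand side above is bounded by

$$\frac{K_3 C_\epsilon}{\epsilon^{2\gamma}} \delta^{2\alpha} \le \delta^2\, \frac{2 K_3}{\epsilon^{2\gamma}} \Big(\frac{C\delta}{\epsilon}\Big)^{2(\alpha-1)}. \tag{21}$$

By Lemma A.2, one can choose $C = (M/\eta)^{1/2(1+\rho)} \delta^{-1/(1+\rho)}$ to assure
(17), and the right-hand side of (21) becomes $\delta^2 K(\epsilon, \eta) \delta^{a}$,
where $K(\epsilon, \eta)$ is a constant and $a = 2(\alpha - 1)\rho/(1 + \rho) > 0$.
Choosing $\delta$ such that $K(\epsilon, \eta) \delta^{a} < \eta$, (16) follows. The proof
of the theorem is completed.       □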